In some metrology applications multiple
results of measurement for a common 
measurand are obtained and it is necessary 
to determine whether the results agree 
with each other. A result of measurement 
based on the Guide to the Expression of 
Uncertainty in Measurement (GUM) 
consists of a measured value together with 
its associated standard uncertainty. In 
the GUM, the measured value is regarded 
as the expected value and the standard 
uncertainty is regarded as the standard 
deviation, both known values, of a 
state-of-knowledge probability 
distribution. A state-of-knowledge 
distribution represented by a result need 
not be completely known. Then how can 
one assess the differences between the 
results based on the GUM? Metrologists 
have for many years used the Birge
chi-square test as 'a rule of thumb' to assess
the differences between two or more 
measured values for the same 
measurand by pretending that the standard 
uncertainties were the standard deviations 
of the presumed sampling probability 
distributions from random variation of the 
measured values. We point out that this is
a misuse of the standard uncertainties; the
Birge test and the concept of statistical
consistency motivated by it do not apply
to the results of measurement based on
the GUM. In 2008, the International
Vocabulary of Metrology, third edition 
(VIM3) introduced the concept of 
metrological compatibility. We propose 
that the concept of metrological
compatibility be used to assess the differences
between results based on the GUM 
for the same measurand. A test of the 
metrological compatibility of two results 
of measurement does not conflict with a 
pairwise Birge test of the statistical 
consistency of the corresponding measured 
values. 
1. Introduction

To test the proficiency of individual laboratories in 
conducting specific tasks, interlaboratory comparisons 
(ILC) are often used. In ILC between measurement 
laboratories, the task is generally the measurement of a 
common artifact or fractions of the same sample of 
material. To develop a certified reference material, a 
well characterized material is measured by two or more 
methods in one or more laboratories. In both cases 
the data consist of multiple results of measurement
(measured values with associated uncertainties) of a
common measurand. To assess the differences between
two or more measured values for the same measurand,
metrologists have for many years used a test proposed
by physicist Raymond T. Birge in 1932 [1]. Birge
introduced the term consistency for lack of significant
differences between measured values. The Birge test is
based on treating the measured values as realizations of
random draws from sampling probability density
functions (pdfs). A sampling pdf models possible outcomes
for measured values in contemplated replications of the
measurement procedure in the same conditions.
Therefore, the consistency of measured values assessed 
by the Birge test is statistical consistency. The Birge 
test applies to uncorrelated measured values only. In 
Sec. 2, we review a concept of statistical consistency 
motivated by the Birge test. The idea of statistical 
consistency belongs to the period when the error 
analysis view of measurements was prevalent. The 
error analysis view of measurements was a hindrance to 
communicating the results of measurement and in 
advancing the science and technology of measurement. 
Therefore leading authorities in the field of metrology 
developed the Guide to the Expression of Uncertainty 
in Measurement (GUM) [2]. According to the GUM, a 
result of measurement consists of a measured value
together with its associated standard uncertainty. In the
GUM, the measured value is regarded as the expected
value and the standard uncertainty is regarded as
the standard deviation, both known values, of a
state-of-knowledge probability distribution. A
state-of-knowledge distribution represented by a result of
measurement need not be completely known. We note in
Sec. 3 that the Birge test and the concept of statistical 
consistency motivated by it are not applicable to the 
results of measurement based on the GUM. Then how 
can one assess the differences between results based on 
the GUM for the same measurand? In 2008, the
International Vocabulary of Metrology, third edition
(VIM3) [3] introduced the concept of metrological
compatibility of two or more results of measurement
determined according to the GUM. In Sec. 4, we
review the VIM3 concept of metrological compatibility
and propose that this concept be used to assess the
differences between multiple results based on the GUM 
for the same measurand. In Sec. 5, we show that a test 
of the metrological compatibility of two results of 
measurement does not conflict with a pairwise Birge 
test of the statistical consistency of the corresponding 
measured values. 



2. The Birge Test and Concept of
Statistical Consistency

Suppose $x_1, \ldots, x_n$ are $n$ measured values for a common
measurand which is believed to be sufficiently stable. The Birge
test is based on regarding the measured values $x_1, \ldots, x_n$
as realizations of random draws from their presumed sampling pdfs.
A sampling pdf models possible outcomes in contemplated
replications of a measurement procedure subject to random effects
in the same conditions. Therefore, the consistency (lack of
significant differences between measured values) assessed by the
Birge test is statistical consistency. The Birge test is applicable
when the sampling pdfs of the measured values $x_1, \ldots, x_n$
are uncorrelated. The Birge test requires knowledge of the
variances $\sigma_1^2, \ldots, \sigma_n^2$ of the sampling pdfs of
$x_1, \ldots, x_n$, respectively. Statistical consistency of the
measured values $x_1, \ldots, x_n$ means that their expected values
are indistinguishable$^1$ in view of the corresponding variances.
Specifically, the Birge test checks whether the measured values
$x_1, \ldots, x_n$ may be modeled as realizations from normal
(Gaussian) sampling pdfs with unknown but equal expected values and
known variances $\sigma_1^2, \ldots, \sigma_n^2$. Birge proposed
that to check the consistency of the measured values
$x_1, \ldots, x_n$, one can calculate the test statistic

$$R^2 = \sum_i w_i (x_i - x_W)^2 / (n - 1), \tag{1}$$

where $w_i = 1/\sigma_i^2$, for $i = 1, 2, \ldots, n$, and
$x_W = \sum_i w_i x_i / \sum_i w_i$ is the weighted mean of
$x_1, \ldots, x_n$. If the calculated value of $R^2$ is
substantially larger than one, then the dispersion of
$x_1, \ldots, x_n$ is greater than what can be expected from the
normal pdfs with equal expected values and known variances
$\sigma_1^2, \ldots, \sigma_n^2$. In that case the measured values
$x_1, \ldots, x_n$ can be declared to be statistically
inconsistent.
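The statistic in (1) is simple to compute directly. The following is a minimal Python sketch, assuming uncorrelated measured values; the function name `birge_ratio` and the numerical data are hypothetical and chosen only for illustration.

```python
def birge_ratio(values, sigmas):
    """Return (x_W, R^2) as in Eq. (1) for uncorrelated measured values.

    values: measured values x_1, ..., x_n
    sigmas: standard deviations sigma_1, ..., sigma_n of the sampling pdfs
    """
    if len(values) != len(sigmas) or len(values) < 2:
        raise ValueError("need n >= 2 values with one sigma each")
    weights = [1.0 / s ** 2 for s in sigmas]  # w_i = 1 / sigma_i^2
    # Weighted mean x_W = sum(w_i x_i) / sum(w_i)
    x_w = sum(w * x for w, x in zip(weights, values)) / sum(weights)
    # Weighted sum of squared deviations from x_W, i.e. (n - 1) R^2
    weighted_ss = sum(w * (x - x_w) ** 2 for w, x in zip(weights, values))
    return x_w, weighted_ss / (len(values) - 1)


# Hypothetical data, for illustration only.
values = [10.1, 9.9, 10.4]
sigmas = [0.1, 0.1, 0.2]
x_w, r2 = birge_ratio(values, sigmas)
print(f"x_W = {x_w:.4f}, R^2 = {r2:.4f}")
```

For these illustrative numbers $R^2$ is about 2.8, which is larger than one; whether that is "substantially larger" is what the p-value interpretation of the test is meant to settle.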

$^1$ In statistical literature the term consistency is applied to a
statistical estimator. A point statistical estimator is said to be
consistent if it approaches the parameter being estimated as the
sample size increases.

Statistical interpretation of the Birge test: Birge was a
physicist and he proposed his test independently of and before much
of the statistical theory as it is known today was established.
However, the Birge test of consistency can now be interpreted as a
classical (sampling theory) statistical test of hypothesis. The
measured values $x_1, \ldots, x_n$ are presumed to have normal
sampling pdfs with unknown but equal expected values and
variance-covariance matrix
$\tau^2 \times \mathrm{Diag}\{\sigma_1^2, \ldots, \sigma_n^2\}$,
where $\tau^2$ is an unknown parameter and
$\sigma_1^2, \ldots, \sigma_n^2$ are known. The null hypothesis
$H_0$ is that $\tau^2 \le 1$ and the alternative hypothesis $H_1$
is that $\tau^2 > 1$. The null hypothesis $H_0$ means that the
variances of $x_1, \ldots, x_n$ are not greater than
$\sigma_1^2, \ldots, \sigma_n^2$, respectively. The alternative
hypothesis $H_1$ means that the variances of $x_1, \ldots, x_n$ are
greater than $\sigma_1^2, \ldots, \sigma_n^2$ [4]. The classical
p-value $p_C$ is the maximum probability under the null hypothesis
of realizing in contemplated replications of the $n$ measurements a
value of the test statistic more extreme than its realized
(calculated) value. The classical p-value of a realization of
$(n - 1) R^2$ is

$$p_C = \Pr\{\chi^2_{(n-1)} > (n - 1) R^2\}, \tag{2}$$

where $\chi^2_{(n-1)}$ denotes a variable with the chi-square
probability distribution with degrees of freedom $(n - 1)$ [4]. If
the classical p-value $p_C$ is too small, say, less than 0.05, then
the null hypothesis is rejected with level of significance 0.05 or
less. A rejection of the null hypothesis means that the dispersion
of the measured values $x_1, \ldots, x_n$ is greater than what can
be expected from normal distributions for $x_1, \ldots, x_n$ with
equal expected values and stated variances
$\sigma_1^2, \ldots, \sigma_n^2$, respectively. The dispersion of
$x_1, \ldots, x_n$ can be greater than expected under the null
hypothesis because either the variances of $x_1, \ldots, x_n$ are
greater than $\sigma_1^2, \ldots, \sigma_n^2$ or their expected
values are not equal. If the stated variances
$\sigma_1^2, \ldots, \sigma_n^2$ are not questionable then the
assumption that the expected values of $x_1, \ldots, x_n$ are equal
appears to be unreasonable. In that case, the measured values
$x_1, \ldots, x_n$ can be declared to be statistically inconsistent.
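The classical p-value in (2) can be evaluated with the standard library alone. The sketch below is one possible implementation, not a prescribed one: `chi2_sf` and `birge_p_value` are hypothetical names, the chi-square tail probability is obtained from the known recurrence for the regularized upper incomplete gamma function at integer degrees of freedom, and the data are illustrative.

```python
import math


def chi2_sf(x, df):
    """Pr{chi2_df > x} for integer df >= 1 and x >= 0.

    Uses Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1) with y = x / 2,
    starting from Q(1/2, y) = erfc(sqrt(y)) or Q(1, y) = exp(-y).
    """
    if df < 1 or x < 0:
        raise ValueError("need df >= 1 and x >= 0")
    y = x / 2.0
    if df % 2:  # odd degrees of freedom
        a, q = 0.5, math.erfc(math.sqrt(y))
    else:       # even degrees of freedom
        a, q = 1.0, math.exp(-y)
    for _ in range((df - 1) // 2):
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def birge_p_value(values, sigmas):
    """Classical p-value of Eq. (2) for uncorrelated measured values."""
    weights = [1.0 / s ** 2 for s in sigmas]
    x_w = sum(w * x for w, x in zip(weights, values)) / sum(weights)
    stat = sum(w * (x - x_w) ** 2 for w, x in zip(weights, values))  # (n-1) R^2
    return chi2_sf(stat, len(values) - 1)


# Hypothetical data, for illustration only.
p_c = birge_p_value([10.1, 9.9, 10.4], [0.1, 0.1, 0.2])
print(f"p_C = {p_c:.4f}")
```

With these illustrative numbers the p-value is slightly above 0.05, so the null hypothesis would not be rejected at that level even though $R^2$ exceeds one.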

Limitations of the Birge test: A limitation of the Birge test is
that it is applicable for uncorrelated measured values
$x_1, \ldots, x_n$ only. However, it can be easily generalized to
correlated measured values $x_1, \ldots, x_n$ whose covariances
denoted by $\sigma_{12}, \ldots, \sigma_{(n-1)n}$ are known [4].
The Birge test suggests the following notion of the statistical
consistency of the measured values $x_1, \ldots, x_n$: The measured
values $x = (x_1, \ldots, x_n)'$ are said to be statistically
consistent if their dispersion is not greater than what can be
expected from the normal consistency model which postulates that
the joint $n$-variate sampling pdf of $x$ is normal
$N(\mathbf{1}\mu, D)$ with unknown expected value $\mathbf{1}\mu$
and variance-covariance matrix $D = [\sigma_{ij}]$, where
$\mathbf{1} = (1, \ldots, 1)'$, $\sigma_{ij}$ is the covariance
between $x_i$ and $x_j$, and $\sigma_{ii} = \sigma_i^2$ for
$i, j = 1, 2, \ldots, n$ [4].

Another limitation of the Birge test (and of its generalized
version for correlated measured values) is that it is a one-sided
test of hypothesis which checks whether the dispersion of
$x_1, \ldots, x_n$ is more than what can be expected from a normal
consistency model. A review of the Birge test in [5] notes that if
the realized value of the Birge test statistic $R^2$ is
substantially less than one, then the stated variances
$\sigma_1^2, \ldots, \sigma_n^2$ may well be too large. To avoid
declarations of statistical consistency from overstated variances,
the following definition of statistical consistency was proposed
in [6].



Definition of statistical consistency: The measured values
$x = (x_1, \ldots, x_n)'$ are said to be statistically consistent
if they reasonably fit the normal consistency model which
postulates that the joint $n$-variate sampling pdf of $x$ is normal
$N(\mathbf{1}\mu, D)$ with unknown expected value $\mathbf{1}\mu$
and variance-covariance matrix $D$.

This definition requires a different approach for testing
statistical consistency than the Birge test and its generalized
version for correlated values. A modern method to assess the fit of
a statistical model to the data is Bayesian posterior predictive
checking [6]. Posterior predictive checking is a Bayesian
adaptation of the classical (sampling theory) statistical
hypothesis testing. A function of the data (and possibly unknown
parameters) called 'discrepancy measure' is defined to characterize
a potential discrepancy between the statistical model and the data.
The posterior predictive p-value $p_P$ of a discrepancy measure
$T(x)$ is the probability of realizing in contemplated replications
a value of the discrepancy measure more extreme than its realized
value. If the posterior predictive p-value is close to zero (or to
one) then the fit of the statistical model to data is suspect.

If the measured values $x_1, \ldots, x_n$ were uncorrelated, then
the statistic $T_R(x) = (n - 1) R^2 = \sum_i w_i (x_i - x_W)^2$ is
a useful discrepancy measure to check the overall fit of the normal
consistency model $N(\mathbf{1}\mu, D)$ to the measured values
$x_1, \ldots, x_n$. As discussed in [6, Sec. 2.4], the posterior
predictive p-value of the realized discrepancy measure
$T_R(x) = (n - 1) R^2$ is

$$p_P = \Pr\{\chi^2_{(n-1)} > (n - 1) R^2\}. \tag{3}$$

We note that (3) is identical to the classical p-value $p_C$ given
in (2). Thus Bayesian posterior predictive checking of the
discrepancy measure $T_R(x) = (n - 1) R^2$ is equivalent to the
Birge test of statistical consistency.

Bayesian posterior predictive checking can be used to investigate
any number of potential discrepancies between the statistical model
and the data. To assess the difference between two particular
measured values $x_i$ and $x_j$, the statistic
$T_{i,j}(x) = |x_i - x_j|$ is a useful discrepancy measure, for
$i, j = 1, 2, \ldots, n$ and $i \ne j$. The Bayesian posterior
predictive p-value of the realized discrepancy measure
$|x_i - x_j|$ is

$$p_P = \Pr\left\{ Z > \frac{|x_i - x_j|}{\sqrt{\sigma_i^2 + \sigma_j^2 - 2\rho_{ij}\sigma_i\sigma_j}} \right\}, \tag{4}$$

where $\rho_{ij}$ is the correlation coefficient between the
presumed normal sampling pdfs of $x_i$ and $x_j$; the covariance
between $x_i$ and $x_j$ is
$\sigma_{ij} = \rho_{ij}\sigma_i\sigma_j$, and $Z$ denotes a
variable with standard normal distribution $N(0, 1)$
[6, Sec. 3.2]. A posterior predictive p-value $p_P$ close to zero
suggests that the difference between $x_i$ and $x_j$ is larger than
what can be expected from the normal statistical consistency model
$N(\mathbf{1}\mu, D)$. That is, the measured values $x_i$ and $x_j$
do not seem to have the same expected value and hence they are not
mutually statistically consistent.
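The pairwise p-value in (4) is a standard normal upper-tail probability. A minimal Python sketch follows; the name `pairwise_p_value` and the input numbers are hypothetical, and the tail probability is computed through the complementary error function.

```python
import math


def pairwise_p_value(x_i, x_j, sigma_i, sigma_j, rho_ij=0.0):
    """Pr{Z > |x_i - x_j| / sd(x_i - x_j)} as in Eq. (4)."""
    var_diff = sigma_i ** 2 + sigma_j ** 2 - 2.0 * rho_ij * sigma_i * sigma_j
    if var_diff <= 0.0:
        raise ValueError("variance of the difference must be positive")
    z_obs = abs(x_i - x_j) / math.sqrt(var_diff)
    # For a standard normal Z, Pr{Z > z} = 0.5 * erfc(z / sqrt(2)).
    return 0.5 * math.erfc(z_obs / math.sqrt(2.0))


# Hypothetical pair of measured values, for illustration only.
p_uncorrelated = pairwise_p_value(10.1, 9.9, 0.1, 0.1)
p_correlated = pairwise_p_value(10.1, 9.9, 0.1, 0.1, rho_ij=0.5)
print(f"rho = 0:   p_P = {p_uncorrelated:.4f}")
print(f"rho = 0.5: p_P = {p_correlated:.4f}")
```

A positive correlation shrinks the standard deviation of the difference, so the same absolute difference yields a smaller p-value.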



3. Concept of Statistical Consistency 
Does Not Apply to Results Based 
on the GUM 

A result of measurement determined according to the GUM consists
of a measured value together with its associated standard
uncertainty. Suppose $[x_1, u(x_1)], \ldots, [x_n, u(x_n)]$ are $n$
results of measurement for a common measurand, where
$x_1, \ldots, x_n$ are the measured values and
$u(x_1), \ldots, u(x_n)$ are the corresponding standard
uncertainties. According to the GUM, a measured value $x_i$ and its
associated standard uncertainty $u(x_i)$ represent a
state-of-knowledge pdf attributed to the measurand, for
$i = 1, 2, \ldots, n$. Following the GUM, we use the symbol $X_i$
for a quantity as well as for a variable with a state-of-knowledge
pdf about the quantity $X_i$ represented by the result
$[x_i, u(x_i)]$, for $i = 1, 2, \ldots, n$. The measured value
$x_i$ is regarded as the expected value $E(X_i)$ and the standard
uncertainty $u(x_i)$ is regarded as the standard deviation $S(X_i)$
of the pdf of $X_i$, for $i = 1, 2, \ldots, n$. The mainstream GUM
requires knowledge of only the expected value $E(X_i)$ and the
standard deviation $S(X_i)$ of a state-of-knowledge pdf of $X_i$.
The GUM does not require that the state-of-knowledge pdf of $X_i$
be completely known. When the state-of-knowledge pdfs of
$X_1, \ldots, X_n$ are correlated, the correlation coefficients are
assumed to be known. Following the GUM we denote the correlation
coefficient $R(X_i, X_j)$ between the state-of-knowledge pdfs of
$X_i$ and $X_j$ by the symbol $r(x_i, x_j)$. Note that
$\{x_1, \ldots, x_n\}$, $\{u(x_1), \ldots, u(x_n)\}$, and
$\{r(x_1, x_2), \ldots, r(x_{n-1}, x_n)\}$ are symbols for known
values.

For many years, metrologists have used the Birge test as 'a rule
of thumb' to assess the consistency of the measured values by
treating the squared standard uncertainties
$u^2(x_1), \ldots, u^2(x_n)$ as the known variances
$\sigma_1^2, \ldots, \sigma_n^2$ of the presumed normal (Gaussian)
sampling pdfs of the measured values $x_1, \ldots, x_n$; see, for
example, [8]. The guideline for the analysis of key comparisons
developed by the BIPM Director's Advisory Group on Uncertainties
recommends the use of the Birge chi-square test to assess the
consistency of measured values by treating the squared standard
uncertainties as the known variances of the presumed sampling pdfs
of the measured values [9]. The consistency of the measured values
from CIPM key comparisons and supplementary comparisons is almost
always assessed using the Birge test [10].

The squared standard uncertainties $u^2(x_1), \ldots, u^2(x_n)$
cannot in any logical sense be identified with the known variances
$\sigma_1^2, \ldots, \sigma_n^2$ of the presumed normal (Gaussian)
sampling pdfs of the measured values $x_1, \ldots, x_n$. The
standard deviation of a sampling pdf represents possible dispersion
from random variation in contemplated replications of the
measurement procedures. A standard uncertainty expresses the
dispersion of a state-of-knowledge pdf which could be attributed to
the measurand based on all available statistical and
non-statistical information. A standard uncertainty includes all
significant components whether arising from random effects or from
corrections applied for systematic effects. All available
statistical and non-statistical information is used to evaluate a
standard uncertainty. In measurements done in high echelon
laboratories, the component of uncertainty arising from random
effects is generally a very small part of the combined standard
uncertainty. Treating the squared standard uncertainties
$u^2(x_1), \ldots, u^2(x_n)$ determined according to the GUM as the
known variances $\sigma_1^2, \ldots, \sigma_n^2$ from random
variation (in contemplated replications of the measurements) is a
misuse of the standard uncertainties. Also, as noted earlier, the
state-of-knowledge pdfs represented by the results
$[x_1, u(x_1)], \ldots, [x_n, u(x_n)]$ may not be completely known.
Therefore the Birge test and the concept of statistical consistency
motivated by the Birge test do not apply to the results of
measurement determined according to the GUM.



4. VIM3 Concept of Metrological 
Compatibility Applies to Results 
Based on the GUM 

A measured quantity value [3, definitions 1.19 and 2.10] is a
product of a numerical value and a measurement unit. The
measurement unit implies that the measured value is traceable to a
reference for that measurement unit. A result of measurement
(measured value together with its associated standard uncertainty)
is traceable to a reference only if the result can be related to a
practical realization of that reference through a documented
unbroken chain of calibrations each contributing to the measurement
uncertainty [3, definition 2.41]. Two or more results of
measurement are metrologically comparable only if they are
traceable to the same reference [3, definition 2.46]. Metrological
comparability does not imply that the measured values have similar
magnitudes. Thus, for example, the distance between my apartment
and my office expressed in meters is metrologically comparable to
the distance between my apartment and the moon also expressed in
meters. The concept of metrological compatibility
discussed in the next section applies only to those 
results of measurement for a common measurand 
which are metrologically comparable. That is, the 
results must be traceable to the same reference. 

The concept of statistical consistency can be applied 
to any set of numerical values which have similar 
magnitudes. They do not have to be measured values. 
Thus, for example, one can test statistical consistency 
of deviations (or relative deviations expressed as 
percentage) from a benchmark value. Although a 
metrologist is expected to assess consistency of only
those measured values which have the same
measurement unit, it is not a requirement of statistical
consistency.

All $n$ results $[x_1, u(x_1)], \ldots, [x_n, u(x_n)]$ for a
common measurand must be traceable to the same reference for them
to be metrologically comparable [3, definition 2.46]. The VIM3
concept of metrological compatibility is defined for two results of
measurement at a time. The following definition is an elaboration
of the succinct definition given in VIM3 [3, definition 2.47].

Definition of metrological compatibility: Two metrologically
comparable results $[x_1, u(x_1)]$ and $[x_2, u(x_2)]$ for the same
measurand are said to be metrologically compatible if

$$\zeta(x_1 - x_2) = \frac{|x_1 - x_2|}{\sqrt{u^2(x_1) + u^2(x_2) - 2 r(x_1, x_2) u(x_1) u(x_2)}} < \kappa, \tag{5}$$

for a specified threshold $\kappa$, where $r(x_1, x_2)$ is a symbol
for the correlation coefficient $R(X_1, X_2)$ between the variables
$X_1$ and $X_2$. The quantity in the denominator of (5) is the
standard deviation of the state-of-knowledge pdf for $X_1 - X_2$,
which may be incompletely determined. When the pdfs represented by
$[x_1, u(x_1)]$ and $[x_2, u(x_2)]$ are uncorrelated, then
$R(X_1, X_2) = 0$ and (5) reduces to

$$\zeta(x_1 - x_2) = \frac{|x_1 - x_2|}{\sqrt{u^2(x_1) + u^2(x_2)}} < \kappa. \tag{6}$$



A set of metrologically comparable results
$[x_1, u(x_1)], [x_2, u(x_2)], \ldots, [x_n, u(x_n)]$ for the same
measurand is said to be metrologically compatible if for every one
of the $n(n - 1)/2$ pairs of results $[x_i, u(x_i)]$ and
$[x_j, u(x_j)]$ we have

$$\zeta(x_i - x_j) = \frac{|x_i - x_j|}{\sqrt{u^2(x_i) + u^2(x_j) - 2 r(x_i, x_j) u(x_i) u(x_j)}} < \kappa, \tag{7}$$

for a specified threshold $\kappa$ [3, definition 2.47]. The VIM3
does not discuss how the threshold $\kappa$ should be determined. A
conventional value of $\kappa$ is two.
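The pairwise criterion (7) can be checked mechanically over all $n(n - 1)/2$ pairs. The following Python sketch assumes the correlation coefficients are supplied as a full matrix (or omitted when the results are uncorrelated); the names `zeta` and `incompatible_pairs` and the data are hypothetical.

```python
import math
from itertools import combinations


def zeta(x_a, u_a, x_b, u_b, r_ab=0.0):
    """Normalized absolute difference of two results, as in Eq. (5)."""
    var_diff = u_a ** 2 + u_b ** 2 - 2.0 * r_ab * u_a * u_b
    if var_diff <= 0.0:
        raise ValueError("variance of the difference must be positive")
    return abs(x_a - x_b) / math.sqrt(var_diff)


def incompatible_pairs(values, uncertainties, corr=None, kappa=2.0):
    """Return the index pairs (i, j) that fail Eq. (7).

    corr is an optional n x n matrix of r(x_i, x_j); None means uncorrelated.
    An empty list means the whole set is metrologically compatible.
    """
    failing = []
    for i, j in combinations(range(len(values)), 2):
        r_ij = corr[i][j] if corr is not None else 0.0
        z = zeta(values[i], uncertainties[i], values[j], uncertainties[j], r_ij)
        if not z < kappa:
            failing.append((i, j))
    return failing


# Hypothetical results [x_i, u(x_i)], for illustration only.
values = [10.1, 9.9, 10.4]
uncertainties = [0.1, 0.1, 0.2]
failing = incompatible_pairs(values, uncertainties)
print("incompatible pairs (0-based):", failing)
```

Returning the failing pairs rather than a single flag keeps the dichotomous conclusion available (the list is empty or not) while also showing which results disagree.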

The concept of metrological compatibility can be used to assess
the differences between the results of measurement based on the GUM
for the same measurand. The concepts of metrological comparability
and compatibility do not require that the state-of-knowledge pdfs
represented by the results
$[x_1, u(x_1)], [x_2, u(x_2)], \ldots, [x_n, u(x_n)]$ be completely
known. Thus they fit the GUM. When the set of results
$[x_1, u(x_1)], \ldots, [x_n, u(x_n)]$ is metrologically
compatible, we can say that the differences between the measured
values $x_1, \ldots, x_n$ are insignificant in view of the
uncertainties $u(x_1), \ldots, u(x_n)$.

To assess metrological compatibility of results based on the GUM
using the criteria (5), (6), or (7), the threshold $\kappa$ needs
to be specified. A proper choice of $\kappa$ is to a large extent a
matter of agreement because it requires accepting the economic
consequences of that choice. Although a conventional value of
$\kappa$ is two, depending on the application, the interested
parties could agree on a different value for $\kappa$. Once the
value of the threshold $\kappa$ is set, the conclusion of a test of
metrological compatibility based on the VIM3 definition is
dichotomous: a set of results is either metrologically compatible
or incompatible. The concept of metrological compatibility is being
used by metrologists who are familiar with it; see for example
[11, 12].
The VIM3 definition of metrological compatibility can be easily
extended to metrological compatibility of a set of results and a
reference result $[x_R, u(x_R)]$, where $x_R$ is the reference
value with standard uncertainty $u(x_R)$. Suppose the pdfs
represented by the measurement results are uncorrelated with the
pdf represented by the reference result. A set of results
$[x_1, u(x_1)], \ldots, [x_n, u(x_n)]$ metrologically comparable
with a reference result $[x_R, u(x_R)]$ is compatible if

$$\zeta(x_i - x_R) = \frac{|x_i - x_R|}{\sqrt{u^2(x_i) + u^2(x_R)}} < \kappa, \tag{8}$$

for $i = 1, 2, \ldots, n$ [13]. Similarly a set of results
$[x_1, u(x_1)], \ldots, [x_n, u(x_n)]$ metrologically comparable
with a combined result $[x_C, u(x_C)]$, where $x_C$ is the combined
value (such as arithmetic mean or a weighted mean) with standard
uncertainty $u(x_C)$ is compatible if

$$\zeta(x_i - x_C) = \frac{|x_i - x_C|}{\sqrt{u^2(x_i) + u^2(x_C) - 2 r(x_i, x_C) u(x_i) u(x_C)}} < \kappa, \tag{9}$$

where $r(x_i, x_C)$ denotes the correlation coefficient between the
pdfs represented by $[x_i, u(x_i)]$ and $[x_C, u(x_C)]$, for
$i = 1, 2, \ldots, n$ [13].
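Criteria (8) and (9) compare each result with one fixed result instead of with every other result. A minimal Python sketch is given below; the function names and numbers are hypothetical, and the correlation coefficient needed for (9) is taken as a caller-supplied input because it depends on how the combined value was formed.

```python
import math


def zeta_against(x_i, u_i, x_0, u_0, r_i0=0.0):
    """Normalized absolute difference between a result and a fixed result.

    With r_i0 = 0 this is the ratio in Eq. (8); with r_i0 = r(x_i, x_C)
    it is the ratio in Eq. (9).
    """
    var_diff = u_i ** 2 + u_0 ** 2 - 2.0 * r_i0 * u_i * u_0
    if var_diff <= 0.0:
        raise ValueError("variance of the difference must be positive")
    return abs(x_i - x_0) / math.sqrt(var_diff)


def compatible_with_reference(values, uncertainties, x_ref, u_ref, kappa=2.0):
    """Per-result flags for Eq. (8), assuming no correlation with the reference."""
    return [
        zeta_against(x, u, x_ref, u_ref) < kappa
        for x, u in zip(values, uncertainties)
    ]


# Hypothetical results and reference result, for illustration only.
values = [10.1, 9.9, 10.5]
uncertainties = [0.1, 0.1, 0.2]
flags = compatible_with_reference(values, uncertainties, x_ref=10.0, u_ref=0.05)
print("compatible with reference:", flags)
```

In this illustration the third result differs from the reference value by more than twice the standard uncertainty of the difference, so it is flagged as incompatible.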



5. Concluding Remarks 

For many years, metrologists have used the Birge chi-square test
as 'a rule of thumb' to assess the differences between two or more
measured values for the same measurand by pretending that the
squared standard uncertainties were the known variances of the
presumed normal sampling pdfs of the measured values. This is a
misuse of the standard uncertainties based on the GUM. The Birge
test and the concept of statistical consistency do not apply to the
results of measurement based on the GUM. As discussed in this
paper, the VIM3 concept of metrological compatibility can be used
to assess the differences between the results of measurement
determined according to the GUM. Thus metrologists can start using
the VIM3 concept of metrological compatibility in place of the
Birge test to assess the differences between multiple results of
measurement of the same measurand.

The following is a pertinent question. Could the conclusions
(about mutual agreement of results) based on the VIM3 concept of
metrological compatibility and the Birge test (based on treating
squared standard uncertainties as the known variances of sampling
pdfs of measured values) differ? It is difficult to directly
compare the Birge test and a test of metrological compatibility
because the former is defined for an arbitrary positive integer
$n > 1$ and the latter is defined for only two results at a time.
For pairwise comparisons ($n = 2$), the Birge test statistic
$R^2 = \sum_i w_i (x_i - x_W)^2 / (n - 1)$ reduces to

$$R^2 = \frac{(x_1 - x_2)^2}{\sigma_1^2 + \sigma_2^2}, \tag{10}$$

which is the square of
$(x_1 - x_2)/\sqrt{\sigma_1^2 + \sigma_2^2}$. Under the null
hypothesis that the presumed normal sampling pdfs of $x_1$ and
$x_2$ have the same expected value, the distribution of
$(x_1 - x_2)/\sqrt{\sigma_1^2 + \sigma_2^2}$ is normal $N(0, 1)$.
Therefore when $n = 2$, the normal distribution can be used to
assess the absolute difference $|x_1 - x_2|$. The square of a
normal $N(0, 1)$ variable has the chi-square distribution
$\chi^2_{(1)}$ with degrees of freedom 1. Therefore the square of
the $(1 - \alpha/2) \times 100$-th percentile $z_{[1-\alpha/2]}$ of
the normal $N(0, 1)$ distribution is equal to the
$(1 - \alpha) \times 100$-th percentile $\chi^2_{(1)[1-\alpha]}$ of
the $\chi^2_{(1)}$ distribution. Thus the realized value of (10)
being less than $\chi^2_{(1)[1-\alpha]}$ is equivalent to the ratio
$|x_1 - x_2|/\sqrt{\sigma_1^2 + \sigma_2^2}$ being less than
$z_{[1-\alpha/2]}$. It follows that declaration of Birge
statistical consistency when the classical p-value $p_C$ of the
Birge test (2) is greater than 0.05 (for example) is equivalent to
the realization that

$$\frac{|x_1 - x_2|}{\sqrt{\sigma_1^2 + \sigma_2^2}} < z_{[0.975]} = 1.96 \approx 2. \tag{11}$$



We note from (6) and (11) that if the threshold $\kappa$ for
metrological compatibility is set as $\kappa = 2$ then the
conclusion of a check of metrological compatibility between a pair
of results $[x_1, u(x_1)]$ and $[x_2, u(x_2)]$ would be identical
to the assessment of statistical consistency between $x_1$ and
$x_2$ based on the Birge test by (wrongly) treating $u^2(x_1)$ and
$u^2(x_2)$ as $\sigma_1^2$ and $\sigma_2^2$, respectively (and
treating the correlation coefficient $R(X_1, X_2)$ as $\rho_{12}$
which is zero in the Birge test). Therefore a pairwise Birge test
of statistical consistency and a test of metrological compatibility
do not conflict.
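The pairwise equivalence can be illustrated numerically. The Python sketch below uses hypothetical numbers and function names; it takes the 95th percentile of the chi-square distribution with one degree of freedom as the square of $z_{[0.975]}$, which is the relation stated in the text rather than an independent computation.

```python
import math
from statistics import NormalDist


def birge_r2_pair(x1, x2, s1, s2):
    """Birge statistic R^2 of Eq. (1) for n = 2, where (n - 1) = 1."""
    w1, w2 = 1.0 / s1 ** 2, 1.0 / s2 ** 2
    x_w = (w1 * x1 + w2 * x2) / (w1 + w2)
    return w1 * (x1 - x_w) ** 2 + w2 * (x2 - x_w) ** 2


def zeta_pair(x1, x2, u1, u2):
    """Compatibility ratio of Eq. (6) for uncorrelated results."""
    return abs(x1 - x2) / math.sqrt(u1 ** 2 + u2 ** 2)


z_975 = NormalDist().inv_cdf(0.975)  # 97.5th percentile of N(0, 1)
chi2_1_95 = z_975 ** 2               # 95th percentile of chi-square, 1 d.o.f.

# Hypothetical pair of results; u(x_i) is (wrongly) used as sigma_i.
x1, x2, u1, u2 = 10.1, 9.9, 0.1, 0.1
r2 = birge_r2_pair(x1, x2, u1, u2)
z = zeta_pair(x1, x2, u1, u2)

birge_consistent = r2 < chi2_1_95    # pairwise Birge test at the 0.05 level
compatible = z < z_975               # Eq. (6) with kappa = z_[0.975]
print(f"R^2 = {r2:.4f}, zeta^2 = {z ** 2:.4f}, z_0.975 = {z_975:.4f}")
print("same conclusion:", birge_consistent == compatible)
```

Because $R^2$ equals the square of the ratio in (6), the two decisions coincide exactly when $\kappa = z_{[0.975]}$; with the conventional $\kappa = 2$ they can differ only for ratios between about 1.96 and 2.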